This document details the data and files needed to replicate "How Protester and State Violence Affect Protest Dynamics".

For each script's output, I have indicated the name of the table or figure it produces.  These names do not necessarily match the filenames of the figures or tables.

########################################################
########################################################
#
# 	1. DATA
#
########################################################
########################################################
1. Data/01_rawData/
	- To protect user anonymity, the UCLA IRB forbids the release of these data.

2. Data/02_processedData/*.csv
	- Aggregated variants of the raw tweets.  The scripts detail how files in 01_rawData/* become 02_processedData/*.csv

3. Data/eventDataComparison/*.csv
	- Data from ICEWS, ACLED, and MMAD for Table 4.

4. Data/manualProtestCoding/*.csv
	- Data created by research assistants to verify the Twitter estimates.

########################################################
########################################################
#
# 	2. Scripts
#
########################################################
########################################################
1. Scripts/replication_01_read_raw_tweets.py
	- This script adds data to the raw tweets.
	- The raw tweets are from Donghyeon Won, a research assistant no longer with us.
	- Input:
		- ./Data/01_rawData/<CountryCode>_tweet_metadata.csv
		- ./Data/01_rawData/tweets_output_new/<CC>.csv
	- Output:
		- Data/01_rawData/02_DonghyeonAlexMerged_<CC>_Threshold826.csv

2. Scripts/replication_02_ProcessRawData.Rmd
	- This script merges the raw data by country and assigns binary classifications to the classifier output based on thresholds.  It uses another dataset of raw tweets provided by Alexander Chan.
	- The shortSpain data are raw tweets with facial metadata added by Alexander Chan.
	- Input:
		- ./Data/01_rawData/shortSpain_prediction_face_screened_s50_c1_7453.csv
	- Output:
		- ./Data/02_processedData/a_DonghyeonAlexmerged_NewclassifiersWithFaces.csv
		- ./Data/02_processedData/b_DonghyeonAlexmerged_NewclassifiersWithBinary.csv
		- ./Data/02_processedData/c_DonghyeonAlexmerged_NewclassifiersShortSpain.csv
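The thresholding step in script 2 can be illustrated with a short sketch. This is not the code in replication_02_ProcessRawData.Rmd (which is R Markdown); the classifier names and cutoff values below are hypothetical and only show how a continuous classifier score becomes a binary label.

```python
# Hypothetical cutoffs; the actual thresholds are defined in the script itself.
THRESHOLDS = {"protest": 0.8, "state_violence": 0.5}


def binarize(scores, thresholds=THRESHOLDS):
    """Return 1 for each classifier whose score meets its cutoff, else 0."""
    return {
        name + "_binary": int(scores[name] >= cutoff)
        for name, cutoff in thresholds.items()
    }


# One tweet's (made-up) classifier scores.
tweet_scores = {"protest": 0.91, "state_violence": 0.12}
labels = binarize(tweet_scores)
print(labels)
```

Keeping the cutoffs in one dictionary makes it easy to rerun the classification with different thresholds for robustness checks.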

2b. Scripts/replication_02b_ProtestImageAnalysis_Botometer_v1.py
	- This script submits each user to Botometer, retrieves the user's rating, attaches that rating to each of the user's tweets, and then classifies the user as a bot or not.
	- Botometer API documentation: https://rapidapi.com/OSoMe/api/botometer/details?utm_source=mashape&utm_medium=301
	- Documentation for the Python client: https://github.com/IUNetSci/botometer-python
	- Input:
		- ./Data/02_processedData/c_DonghyeonAlexmerged_classifiers_shortSpain.csv
	- Output:
		- ./Data/02_processedData/z_botometerResults.txt
		- ./Data/02_processedData/c2_DonghyeonAlexmerged_classifiers_shortSpain.csv
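The per-tweet labeling in script 2b can be sketched as below. The sketch deliberately skips the Botometer request itself and starts from user scores that have already been retrieved; the field names (user_id, bot_score, is_bot) and the 0.5 cutoff are assumptions for illustration, not the values used in the script.

```python
BOT_CUTOFF = 0.5  # hypothetical cutoff, not necessarily the script's


def label_tweets(tweets, user_scores, cutoff=BOT_CUTOFF):
    """Copy each user's bot score onto that user's tweets and flag likely bots."""
    labeled = []
    for tweet in tweets:
        # None when the user was never scored (e.g. a deleted account).
        score = user_scores.get(tweet["user_id"])
        is_bot = None if score is None else score >= cutoff
        labeled.append({**tweet, "bot_score": score, "is_bot": is_bot})
    return labeled


# Made-up example data.
user_scores = {"u1": 0.92, "u2": 0.08}
tweets = [
    {"tweet_id": 1, "user_id": "u1"},
    {"tweet_id": 2, "user_id": "u2"},
    {"tweet_id": 3, "user_id": "u3"},
]
labeled = label_tweets(tweets, user_scores)
```

Scoring users once and then joining to tweets avoids repeated API calls for users with many tweets.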

3a. Scripts/replication_03a_InvestigateBots_v1.R
	- This script investigates the bot results from 02b.  It generates histograms by country as well as a table of likely bots by country, both for the Supplementary Materials.
	- Input:
		- ./Data/02_processedData/c2_DonghyeonAlexmerged_classifiers_shortSpain.csv
	- Output:
		- Table A13.

3b. Scripts/replication_03b_AggregateRawTweets.py
	- Aggregates tweets to the city-day, state-day, and country-day level.
	- Creates subsets for robustness checks and aggregates those subsets.
	- The output is aggregated and therefore available for replication.
	- Input:
		- ./Data/02_processedData/c2_DonghyeonAlexmerged_classifiers_shortSpain.csv
		- ./Data/02_processedData/c_DonghyeonAlexmerged_Newclassifiers_ShortSpain.csv
		- ./Data/02_processedData/c2_DonghyeonAlexmerged_classifiers_ShortSpain_dedupDetect.csv
			- This dataset contains raw tweets with an indicator for whether each is a duplicate.  It is used to keep only the first tweet in each set of duplicates.
	- Output:
		- ./Data/02_processedData/d_DonghyeonAlexmerged_cityday_withNewClassifierOutput<file_modifier>.csv
			- <file_modifier> is a suffix such as "verifiedAccounts", "dominantLanguage", etc. that indicates which subset the dataset contains.
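The two operations in script 3b (keeping only the first tweet in each set of duplicates, then aggregating to the city-day) can be sketched as follows. The field names (duplicate_group, city, date) are hypothetical, and the sketch assumes the tweets are already in chronological order so that the first tweet seen is the earliest.

```python
from collections import Counter


def keep_first_duplicates(tweets):
    """Drop every tweet whose duplicate group was already seen."""
    seen = set()
    kept = []
    for tweet in tweets:
        group = tweet.get("duplicate_group")  # None means "not a duplicate"
        if group is not None:
            if group in seen:
                continue
            seen.add(group)
        kept.append(tweet)
    return kept


def aggregate_city_day(tweets):
    """Count tweets per (city, date) pair."""
    return dict(Counter((t["city"], t["date"]) for t in tweets))


# Made-up example data.
tweets = [
    {"city": "Caracas", "date": "2014-02-12", "duplicate_group": "a"},
    {"city": "Caracas", "date": "2014-02-12", "duplicate_group": "a"},
    {"city": "Caracas", "date": "2014-02-12", "duplicate_group": None},
    {"city": "Valencia", "date": "2014-02-13", "duplicate_group": None},
]
city_day = aggregate_city_day(keep_first_duplicates(tweets))
```

The same two functions can be applied to each robustness subset before writing the file with the matching <file_modifier> suffix.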


4. Scripts/replication_04_ProcessFromPythonAggregation.Rmd
	- Starting from the Python aggregation, this script adds the number of protesters based on images, adds missing days, recalculates lags after adding the missing days, and adds weekday labels to dates.
	- Input:
		- City, state, and country aggregations by hour and day
		- Robustness subsets as well (only mobile, only dominant language, etc.)
	- Output:
		- ./Data/02_processedData/e_<country/state/city>day_UsersAndMissingDays_<file_modifier>.csv
			- The same processed output is written for the robustness aggregates.  <file_modifier> takes the same endings as in replication_03b.
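The reason lags are recalculated after missing days are added can be seen in a small sketch. It is not the code in replication_04_ProcessFromPythonAggregation.Rmd: the function names are hypothetical, and filling missing days with 0 is an assumption (the script may use a different fill value, such as a missing-value marker).

```python
from datetime import date, timedelta


def fill_missing_days(series, fill_value=0):
    """series maps date -> count; return (date, count) rows with no gaps."""
    start, end = min(series), max(series)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [(day, series.get(day, fill_value)) for day in days]


def add_lag_and_weekday(rows, k=1):
    """Attach the count from k days earlier and a weekday label to each row."""
    counts = [count for _, count in rows]
    return [
        {
            "date": day,
            "count": count,
            "lag": counts[i - k] if i >= k else None,
            "weekday": day.strftime("%A"),
        }
        for i, (day, count) in enumerate(rows)
    ]


# Made-up city-day counts with two days missing.
series = {date(2020, 1, 1): 5, date(2020, 1, 4): 2}
rows = add_lag_and_weekday(fill_missing_days(series))
```

Without the fill step, the one-day lag for 4 January would be taken from 1 January (5), which is three days earlier; after filling, it is correctly taken from 3 January.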

5. Scripts/replication_05_MainWork.Rmd
	- Runs the regressions and robustness checks and produces some summary statistics.
	- I did not consistently name the output to match its number in the paper.  When appropriate, the script contains section and subsection titles to indicate which of the paper's figure or table is created.
	- When using Stargazer, standard errors were often not printed.  I therefore also printed the model output to the console and manually entered those standard errors.  The values are the same; I just could not figure out how to make Stargazer print them.
	- Input:
		- ./Data/02_processedData/e_DonghyeonAlexmerged_cityday_UsersAndMissingDays.csv
		- The equivalent processed output for the robustness aggregates.
	- Output:
		- Data/02_processedData/f_eParsedForMainRegressions.csv
		- Paper figures:
			- Figures 1a-c are not produced in any script.  I used PowerPoint to align images downloaded from tweets, saved the slide as a .pdf, cropped the .pdf, and inserted that .pdf for each panel.
			- Figures 3a-3b
			- Figure 4
			- Figures 5a-5d
			- Appendix Figures A11, A18, A19, A20
		- Tables:
			- Table 2 was made by hand.
			- Table 5.
			- Table 6.
			- Table 7.
			- Appendix Tables A4, A5, A6, A7, A8, A9, A10, A11, A12


6. Scripts/replication_06_DescriptiveStatistics_Graphs.Rmd
	- The purpose of this script is to generate descriptive figures for the paper.
	- It generates the protest size and state violence by day figures.
	- Input:
		- ./Data/02_processedData/countryday_withClassifierOutput_avgUsers_Cleaned.csv
	- Output:
		- ./Users/Zack/Dropbox/Twitter_ViolenceImagesFromDonghyeon/Figures/<CC>_filedescription.pdf
		- Figures:
			- Figures 2a-2d
			- Figures A13a-d

7. Scripts/replication_07_Table3FigA12.R
	- This script creates the values for Table 3; I manually created the table in LaTeX.
	- Input:
		- Manually created datasets
		- Data/02_processedData/e_DonghyeonAlexmerged_cityday_UsersAndMissingDays.csv
	- Output:
		- Table 3
		- Figures A12a-d

8. Scripts/replication_08_Table4.py
	- This script replicates Table 1 from Zhang and Pan 2019.  It creates the values for Table 4; I manually created the table in LaTeX.
	- Input:
		- ./Data/eventDataComparison/eventData_CASMTable1_mmad.csv
		- ./Data/eventDataComparison/eventData_CASMTable1_icews.csv
		- ./Data/eventDataComparison/eventData_CASMTable1_acled.csv
		- ./Data/02_processedData/f_eParsedForMainRegressions.csv
	- Output:
		- The raw numbers used in Table 4.  This script does not actually create Table 4.  The creation was done manually in the .tex using the numbers from this script.


9. Scripts/replication_SM_01_FigA9A10.R
	- Produces Figures A9, A10.
	- NB: Requires raw data.
	- Input:
		- ./Data/02_processedData/c_DonghyeonAlexmerged_Newclassifiers_ShortSpain.csv
		- ./Data/facesValidation/Labeling/Step2_FaceValidation_Bernard.csv
		- ./Data/facesValidation/Labeling/Step2_FaceValidation_Jack.csv
		- ./Data/facesValidation/Labeling/Step2_FaceValidation_Jun.csv
	- Output:
		- ./Figures/AJPS_Reject_Fall2019/classifierValidation_barchart.jpg
		- ./Figures/faceValidation_loess.jpg

10. Scripts/replication_SM_02_FigA15.R
	- Produces Figure A15.
	- Note that it includes raw data.  I have included the output .csvs which are used to make Figure A15 in ./Data/comparingUsers/
	- Input:
		- ./Data/comparingUsers/tweets_with_images.txt
		- ./Data/comparingUsers/tweets_with_protest_images.txt
	- Output:
		- ./Data/comparingUsers/summaryStats_images_noprotest.csv
		- ./Data/comparingUsers/summaryStats_images_noprotest_se.csv
		- ./tables/summaryStats_images_noprotest.csv
		- ./Data/comparingUsers/summaryStats_images_protest.csv
		- ./Data/comparingUsers/summaryStats_images_protest_se.csv
		- ./tables/summaryStats_images_protest.csv

11. Scripts/replication_SM_03_FigA16.R
	- Produces Figure A16.
	- Note that it includes raw data.
	- Input:
		- ./Data/02_processedData/c_DonghyeonAlexmerged_Newclassifiers_ShortSpain.csv
		- ./Data/Charlottesville/5_finalData_v2.csv
		- ./Data/Charlottesville/ConvNeurNet_Results
	- Output:
		- ./Figures/biasByAgg_totalFaces_wSummary.jpg
		- ./Figures/biasByAgg_stateviolence_wSummary.jpg
		- ./Figures/biasByAgg_protesterviolence_wSummary.jpg
		- ./Figures/biasByAgg_cville3.jpg

12. Scripts/replication_SM_04_Table14FigA22.R
	- Produces Table A14 and Figure A22.
	- Note that it includes raw data.
	- Input:
		- ./Data/02_processedData/c2_DonghyeonAlexmerged_classifiers_ShortSpain_dedupDetect.csv
	- Output:
		- ./Figures/DuplicatesHistogram_pooled.jpeg
		- ./Tables/deduplication_table_byCity.tex
